Intro to Geopatial Data

Geospatial Data can be defined as any data that is linked to a specific location on the surface of the Earth.

A lot of geospatial data is publicly available online.

{sf}

The sf package provides simple features access for R. Simple features is a common storage and access model of mostly two-dimensional geometries (like points, lines, and polygons) used by geographic information systems.

You can convert data associated with a specific latitude and longitude to an sf object, which is similar to a data frame, but with extra information about the geometry.

raw_data <- read_csv("simulated_case_locations.csv") 

library(sf)
d_events <- st_as_sf(raw_data, coords = c('X', 'Y')) %>% 
    st_set_crs(4326)

d_events
## Simple feature collection with 5227 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -84.81942 ymin: 39.04028 xmax: -84.25902 ymax: 39.31012
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 5,227 x 2
##       id             geometry
##    <dbl>          <POINT [°]>
##  1     1 (-84.52842 39.10935)
##  2     2  (-84.5291 39.10753)
##  3     3 (-84.53005 39.10966)
##  4     4 (-84.52883 39.10792)
##  5     5 (-84.53021 39.10781)
##  6     6 (-84.53026 39.10672)
##  7     7  (-84.5315 39.10842)
##  8     8 (-84.52977 39.10774)
##  9     9 (-84.52776 39.11035)
## 10    10 (-84.52991 39.10767)
## # … with 5,217 more rows

{tigris}

The tigris package allows a user to directly download and use TIGER/Line shapefiles (including boundaries for states, counties, census tracts, etc.) from the U.S. Census Bureau in R.

For example, we can download the shapefile for the census tracts in Hamilton County, Ohio.

hamilton_tracts <- tigris::tracts(state = '39', county = '61')

We can perform a spatial join using st_join() from the sf package to overlay the points from our event data into their census tracts (i.e., “assign” each point to its census tract). Then we can calculate the number of events in each tract using group_by() and summarize() from the tidyverse.

d_events <- st_join(d_events, hamilton_tracts)

d_tracts <- d_events %>% 
    group_by(GEOID) %>% 
    summarize(n_events = n()) %>% 
    st_set_geometry(NULL)
## # A tibble: 222 x 2
##    GEOID       n_events
##    <chr>          <int>
##  1 39061000200       12
##  2 39061000700        3
##  3 39061000900       13
##  4 39061001000        9
##  5 39061001100       13
##  6 39061001600       11
##  7 39061001700       16
##  8 39061001800        6
##  9 39061001900        3
## 10 39061002000        1
## # … with 212 more rows

{tidycensus}

The tidycensus package interfaces with the US Census Bureau’s decennial Census and five-year American Community APIs and returns tidyverse-ready data frames.

For example, we can download the population under 18 of each census tract in Hamilton County.

d_pop <- tidycensus::get_acs(geography = 'tract',
                             variables = 'B09001_001E',
                             year = 2016,
                             state = 39, county = 61,
                             geometry = TRUE) %>%
    select(GEOID, n_children = estimate)
d_pop
## Simple feature collection with 222 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.82016 ymin: 39.02208 xmax: -84.25651 ymax: 39.31206
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
## First 10 features:
##          GEOID n_children                       geometry
## 1  39061000200        234 MULTIPOLYGON (((-84.53186 3...
## 2  39061000700         80 MULTIPOLYGON (((-84.51926 3...
## 3  39061000900        325 MULTIPOLYGON (((-84.52104 3...
## 4  39061001000        225 MULTIPOLYGON (((-84.51595 3...
## 5  39061001100        351 MULTIPOLYGON (((-84.5109 39...
## 6  39061001600        259 MULTIPOLYGON (((-84.52619 3...
## 7  39061001700        348 MULTIPOLYGON (((-84.51742 3...
## 8  39061001800        159 MULTIPOLYGON (((-84.51075 3...
## 9  39061001900        124 MULTIPOLYGON (((-84.50007 3...
## 10 39061002000        111 MULTIPOLYGON (((-84.4879 39...

We can use this information to calculate the event rate per 1000 children in each tract.

d <- left_join(d_pop, d_tracts, by = 'GEOID') %>% 
    mutate(event_rate = n_events / n_children * 1000)
d
## Simple feature collection with 222 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.82016 ymin: 39.02208 xmax: -84.25651 ymax: 39.31206
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
## First 10 features:
##          GEOID n_children n_events                       geometry
## 1  39061000200        234       12 MULTIPOLYGON (((-84.53186 3...
## 2  39061000700         80        3 MULTIPOLYGON (((-84.51926 3...
## 3  39061000900        325       13 MULTIPOLYGON (((-84.52104 3...
## 4  39061001000        225        9 MULTIPOLYGON (((-84.51595 3...
## 5  39061001100        351       13 MULTIPOLYGON (((-84.5109 39...
## 6  39061001600        259       11 MULTIPOLYGON (((-84.52619 3...
## 7  39061001700        348       16 MULTIPOLYGON (((-84.51742 3...
## 8  39061001800        159        6 MULTIPOLYGON (((-84.51075 3...
## 9  39061001900        124        3 MULTIPOLYGON (((-84.50007 3...
## 10 39061002000        111        1 MULTIPOLYGON (((-84.4879 39...
##    event_rate
## 1   51.282051
## 2   37.500000
## 3   40.000000
## 4   40.000000
## 5   37.037037
## 6   42.471042
## 7   45.977011
## 8   37.735849
## 9   24.193548
## 10   9.009009

Mapping

There are many different ways to visualize geospatial data in R.

First, we could use the plotting features of base R.

plot(d['event_rate'])

{ggplot2} and geom_sf()

Using ggplot give us more flexibility and potential for customization, and is comfortable for those already familiar with ggplot functionality.

ggplot() +
  geom_sf(data=d, aes(fill=event_rate)) +
  scale_fill_viridis_c() +
  labs(fill="Event Rate") +
  ggthemes::theme_map() +
  theme(legend.position = c(0.9, 0))

{mapview}

The mapview package is useful for quick and convenient interactive visualisations of spatial data. It was created to fill the gap of quick (not presentation grade) interactive plotting to examine and visually investigate both aspects of spatial data, the geometries and their attributes.

mapview::mapview(d, zcol='event_rate')

{tmap} : Static Mode

The tmap package allows the user to thematic maps with great flexibility. The syntax for creating plots is similar to that of ggplot2, but tailored to maps.

tmap has two modes. plot mode is useful for publication-ready static maps.

library(tmap)
tmap_mode(mode='plot')

tm1 <- tm_shape(d) +
          tm_fill(col='event_rate', palette='viridis', title="Event Rate") +
       tm_shape(d) + 
          tm_borders() +
       tm_layout(legend.position = c(0.9, 0),
            frame = FALSE) +
       tm_scale_bar(position = c(0,0)) +
       tm_compass(position = c(0.05, 0.1), size = 3, lwd=1)

{tmap} : Interactive Mode

We can change tmap_mode to view and print interactively view the static map we created in the previous step. This is similar to mapview but has more customizability.

tmap_mode(mode='view')

Bivariate Mapping

More {tmap}

tmap_mode("plot")
tm2 <- tm_shape(d) +
          tm_polygons(c('event_rate', 'dep_index'),
                      palette = list('Blues', 'Reds'),
                      title=list("Event Rate", "Deprivation Index"),
                      alpha=0.8) +
       tm_facets(sync = TRUE, ncol=1) +
       tm_layout(legend.position = c(0.9, 0),
            frame = FALSE) +
       tm_scale_bar(position = c(0,0)) +
       tm_compass(position = c(0.05, 0.1), size = 3, lwd=1)
tm2

tmap_mode("view")
tm2

{biscale}

The biscale package uses features of ggplot2 to create bivariate choropleth maps.

library(biscale)
d_biscale <- bi_class(d, x = dep_index, y = event_rate, style = "quantile", dim = 3)

map <- ggplot() +
  geom_sf(data = d_biscale, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
  bi_scale_fill(pal = "DkViolet", dim = 3) +
  bi_theme() +
  theme(title = element_text(size=15))

legend <- bi_legend(pal = "DkViolet",
                    dim = 3,
                    xlab = "Higher Deprivation",
                    ylab = "Higher Event Rate ",
                    size = 8)
library(cowplot)
b1 <- ggdraw() +
  draw_plot(map, 0, 0, 1, 1) +
  draw_plot(legend, 0, 0, 0.2, 0.5, vjust=0.1, hjust=-0.5)